Analyzing Carbon Footprints in SQL

Code
import pandas as pd
import sqlite3
import sql
from itables import init_notebook_mode

Greenhouse gas emissions attributable to products—from food to sneakers to appliances—make up more than 75% of global emissions.

Cite: The Carbon Catalogue

The dataset, which is publicly availably on nature.com, stores product carbon footprints (PCFs) for various companies. PCFs are the greenhouse gas emissions attributable to a given product, measured in CO2e (carbon dioxide equivalent).

Database contents

Our database contains one table, product_emissions, which looks at PCFs by product as well as the stage of production these emissions occured in.

field data_type
id VARCHAR
year INT
product_name VARCHAR
company VARCHAR
country VARCHAR
industry_group VARCHAR
weight_kg NUMERIC
carbon_footprint_pcf NUMERIC
upstream_percent_total_pcf VARCHAR
operations_percent_total_pcf VARCHAR
downstream_percent_total_pcf VARCHAR
Code
init_notebook_mode(all_interactive=True)
Code
df = pd.read_csv("product_emissions.csv") 
conn = sqlite3.connect(":memory:")
Code
df.to_sql("product_emissions", conn, index=False, if_exists="replace") 
866
Code
# Load the SQL extension
%load_ext sql

# Register the existing in-memory SQLite connection in jupysql

%sql conn --alias prod_emission

1: Coca-Cola’s emissions

First, let’s look at a small subset of the data: emissions reported by Coca-Cola. Coke is actually made up of multiple companies around the globe, so we’ll make sure our query returns data for any company name that starts with “Coca-Cola”. Coke used to report for every single different product it has, so we will limit the results to six.

  • Select all fields from the product_emissions table where the company name begins with “Coca-Cola”, limiting to the first six results.
Code
## Select all fields where the company name is Coca-Cola, limiting to the first six results
Code
%%sql
SELECT *
FROM product_emissions
WHERE company LIKE 'Coca-Cola%'
LIMIT 6
Running query in 'prod_emission'
index id year product_name company country industry_group weight_kg carbon_footprint_pcf upstream_percent_total_pcf operations_percent_total_pcf downstream_percent_total_pcf
468 22710-1-2014 2014 Coca-Cola (all packaging and sizes) Coca-Cola HBC AG Switzerland Food, Beverage & Tobacco 1.093 0.1673 84.28% 11.12% 4.60%
469 22710-1-2015 2015 Coca-Cola (all packaging and sizes) Coca-Cola HBC AG Switzerland Food & Beverage Processing 1.093 0.158 38.37% 12.71% 48.93%
470 22710-1-2016 2016 Coca-Cola (all packaging and sizes) Coca-Cola HBC AG Switzerland Food, Beverage & Tobacco 1.093 0.147 42.17% 10.98% 46.85%
588 3565-10-2013 2013 Coke Zero 330 ml glass bottle Coca-Cola Enterprises, Inc. USA Food, Beverage & Tobacco 0.541 0.34 73.95% 3.42% 22.63%
589 3565-11-2013 2013 Coke Zero 500ml PET Coca-Cola Enterprises, Inc. USA Food, Beverage & Tobacco 0.541 0.22 52.09% 12.32% 35.59%
590 3565-1-2013 2013 Coca-Cola 300ml can Coca-Cola Enterprises, Inc. USA Food, Beverage & Tobacco 0.3 0.17 69.76% 7.00% 23.24%

A little background:

  • Upstream emissions: emissions that occur before the company’s own operations such as emissions created by manufacturing bottles that Coke buys from suppliers
  • Operations emissions: emissions that the company creates directly, such as when Coke is bottling its product
  • Downstream emissions: emissions that occur after the product leaves the company, such as after Coke has sold drinks to McDonald’s

2: Most recent data

We’ll focus on recent emissions data during this code-along. When was the most recent data collected?

  • Return the most recent year for which data was collected.
Code
%%sql
-- Return the most recent year for which data was collected
SELECT MAX(year)
FROM product_emissions;
Running query in 'prod_emission'
MAX(year)
2017

3: Targeting major emitters

What are the industries with the most emissions in 2017 (the most recent year that data is available)?

  • Return the industry_group and a rounded total of carbon_footprint_pcf for each industry, aliasing as total_industry_footprint.
  • Limit to data for 2017 and order by total_industry_footprint.
Code
%%sql
-- Return industry_group and a rounded total of carbon_footprint_pcf, aliased as total_industry_footprint
-- Limit to data for 2017 and order by total_industry_footprint
SELECT industry_group, ROUND(SUM(carbon_footprint_pcf), 2) AS total_industry_footprint
FROM product_emissions
WHERE year = 2017
GROUP BY industry_group
ORDER BY SUM(carbon_footprint_pcf) DESC;
Running query in 'prod_emission'
industry_group total_industry_footprint
Materials 107129.0
Capital Goods 94942.67
Technology Hardware & Equipment 21865.09
Food, Beverage & Tobacco 3161.47
Commercial & Professional Services 740.6
Software & Services 690.0

4: Industry representation

It looks like the Materials industry had a huge carbon footprint in 2017. But what if that’s just because there are many companies from the Materials industry in the dataset? Let’s check which industries are most heavily represented in that year.

  • Return each industry_group included in the table and a count of the number of records that list that industry group.
  • Limit the results to only those from 2017 and lias the count as count_industry.
  • Order by count_industry, descending.
Code
%%sql
-- Return the industry groups and a count of the number of records for each group
-- Limit the results to only those from 2017 and alias the count as count_industry
-- Order by count_industry, descending
SELECT industry_group, COUNT(*) AS count_industry
FROM product_emissions
WHERE year = 2017
GROUP BY industry_group
ORDER BY COUNT(industry_group) DESC;
Running query in 'prod_emission'
industry_group count_industry
Technology Hardware & Equipment 22
Food, Beverage & Tobacco 22
Materials 11
Capital Goods 4
Commercial & Professional Services 2
Software & Services 1

5: Capital Goods industry

We can see that the Materials industry is the biggest emitter despite having less representation in our dataset than several other industries—yikes! The Capital Goods industry looks similar. Let’s explore the companies and products reporting for 2017 in the Capital Goods industry.

  • Return industry_group, company, and product_name for all records reporting in the Capital Goods industry during 2017.
Code
%%sql
-- Return industry_group, company, and product_name for all records reporting in the Capital Goods industry during 2017
SELECT industry_group, company, product_name
FROM product_emissions
WHERE industry_group = 'Capital Goods' 
    and year = 2017;
Running query in 'prod_emission'
industry_group company product_name
Capital Goods Mitsui Mining & Smelting Co., Ltd. Zinc Oxide
Capital Goods Daikin Industries, Ltd. Residential Air Conditioner
Capital Goods Daikin Industries, Ltd. Commercial Air Conditioner
Capital Goods Daikin Industries, Ltd. Light commercial Air Conditioner

6: Capital Goods lifecycle emissions

Daikin is an air conditioning and refrigeration manufacturer. Let’s look at emissions throughout the life cycle of Daikin products. Can you guess whether most emissions are upstream, downstream, or during operations?

  • Return product_name, company, upstream_percent_total_pcf, operations_percent_total_pcf, and downstream_percent_total_pcf for Daikin Industries, Ltd. in 2017.
Code
%%sql
-- Return product_name, company, and all stages of pcf emissions for Daikin in 2017
SELECT product_name, 
    company,
    upstream_percent_total_pcf, 
    operations_percent_total_pcf, 
    downstream_percent_total_pcf
FROM product_emissions
WHERE year = 2017
    and company = 'Daikin Industries, Ltd.'
Running query in 'prod_emission'
product_name company upstream_percent_total_pcf operations_percent_total_pcf downstream_percent_total_pcf
Residential Air Conditioner Daikin Industries, Ltd. 3.96% 0.55% 95.50%
Commercial Air Conditioner Daikin Industries, Ltd. 0.98% 0.13% 98.88%
Light commercial Air Conditioner Daikin Industries, Ltd. 0.70% 0.10% 99.21%

Exploring visually with plotly express

Let’s do some quick visualizations with plotly.express, a free an open-source graphing library. We don’t assume any knowledge of plotly on your part!

Code
# Import the plotly express package
import plotly.express as px

7: Country representation

Let’s take a look at emissions by country. You may have noticed that each time we run a query, the query results are available as df. So if we were to select all information from product_emissions, that would create a DataFrame called df that we can use in a plotly visualization! Let’s try it.

  • Select all information from product_emissions.
Code
%%sql --save company_country

SELECT country, COUNT(*) AS count
FROM product_emissions
GROUP BY country
ORDER BY count DESC
Running query in 'prod_emission'
country count
USA 305
Japan 110
Germany 67
Taiwan 60
Netherlands 35
Finland 35
United Kingdom 32
Switzerland 28
Sweden 26
Italy 23
Truncated to displaylimit of 10.

Let’s look at country representation in our dataset by creating a bar chart using the country column of df.

  • Create a plotly bar chart of the country distribution for companies in our dataset.
Code
query = %sql SELECT country, count FROM company_country
prod_emissions = pd.DataFrame(query)
prod_emissions.columns = ['country', 'count']
prod_emissions
Generating CTE with stored snippets: 'company_country'
Running query in 'prod_emission'
country count
Loading ITables v2.2.4 from the init_notebook_mode cell... (need help?)
Code
# Create a plotly bar chart of the country distribution for companies in our dataset
fig = px.bar(prod_emissions, x="country", y="count", title="Country Distribution for Companies")
fig.show()
Unable to display output for mime type(s): application/vnd.plotly.v1+json

8: Emissions by country

How does this compare to the emissions of companies reporting from each country?

  • First, get the results of interest using SQL: grouping by country, select country and the sum of total carbon_footprint_pcf by country, aliasing as total_country_footprint.
Code
%%sql --save country_footprint
-- Group by country
-- Select country and the sum of total carbon_footprint_pcf by country, aliasing as total_country_footprint
SELECT country, SUM(carbon_footprint_pcf) AS total_country_footprint
FROM product_emissions
GROUP BY country
ORDER BY total_country_footprint DESC;
Running query in 'prod_emission'
country total_country_footprint
Spain 9786126.23
Germany 2251224.92482
Japan 519344.3134
USA 451869.1689
Brazil 167587.74467
Luxembourg 167007.3
South Korea 140994.148
Netherlands 70415.34
Taiwan 61513.068
India 24574.0
Truncated to displaylimit of 10.
  • Create a plotly bar plot of the emissions by country in our dataset.
Code
query1 = %sql SELECT country, total_country_footprint FROM country_footprint
footprint_df = pd.DataFrame(query1)
footprint_df.columns = ['country', 'total_country_footprint']
footprint_df
Generating CTE with stored snippets: 'country_footprint'
Running query in 'prod_emission'
country total_country_footprint
Loading ITables v2.2.4 from the init_notebook_mode cell... (need help?)
Code
# Create a plotly bar plot of the emissions by country in our dataset
px.bar(footprint_df, x="country", y="total_country_footprint")
Unable to display output for mime type(s): application/vnd.plotly.v1+json

9: Does the graph above make sense?

Wow! Spain has a lot of emissions! Where do they come from? To finish our exploration together, let’s take a quick look at the underlying data as a gut-check.

  • In SQL, select company and carbon_footprint_pcf for companies in Spain.
Code
%%sql
-- Select company and carbon_footprint_pcf for companies in Spain
SELECT company, carbon_footprint_pcf
FROM product_emissions
WHERE country = 'Spain'
ORDER BY carbon_footprint_pcf DESC;
Running query in 'prod_emission'
company carbon_footprint_pcf
Gamesa Corporación Tecnológica, S.A. 3718044.0
Gamesa Corporación Tecnológica, S.A. 3276187.0
Gamesa Corporación Tecnológica, S.A. 1532608.0
Gamesa Corporación Tecnológica, S.A. 1251625.0
Compañía Española de Petróleos, S.A.U. CEPSA 6109.0
Compañía Española de Petróleos, S.A.U. CEPSA 890.0
Crimidesa 180.0
Agraz 155.78
Agraz 155.71
Crimidesa 140.0
Truncated to displaylimit of 10.

Gamesa Corporación Tecnológica is actually a renewable energy company specializing in wind power! Why might it have such high emissions?

Gamesa Corporación Tecnológica, despite being a leading wind energy company, may still have high CO₂ emissions due to several factors. The production of wind turbines involves energy-intensive materials like steel and concrete, while transportation and installation rely on fossil fuel-powered ships, trucks, and cranes. Additionally, if its manufacturing facilities use electricity from fossil fuel-based grids, indirect emissions can be significant. Maintenance, especially for offshore wind farms, often requires diesel-powered vehicles or helicopters, further adding to emissions. Lastly, the decommissioning and recycling of turbine components, particularly composite blades, is an energy-intensive process. However, despite these emissions, wind power still has a much lower carbon footprint compared to fossil fuel-based energy sources.

Back to top